Supervised and unsupervised PCFG adaptation to novel domains

Authors

  • Brian Roark
  • Michiel Bacchiani
Abstract

This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001). Other approaches falling within this framework are more effective. In contrast to the results in Gildea (2001), we show F-measure parsing accuracy gains of as much as 2.5% for high-accuracy lexicalized parsing through the use of out-of-domain treebanks, with the largest gains when the amount of in-domain data is small. MAP adaptation can be based on either supervised or unsupervised adaptation data. Even when no in-domain treebank is available, unsupervised techniques provide a substantial accuracy gain over unadapted grammars, of nearly 5% F-measure.
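
As a rough illustration of how corpus mixing can be read as a MAP estimate, the sketch below merges rule counts from an out-of-domain treebank with in-domain counts and renormalizes per left-hand side. It assumes a Dirichlet prior whose pseudo-counts are the out-of-domain counts scaled by a weight `alpha`. The function name `map_count_merging`, the parameter `alpha`, the unlexicalized rules, and the toy counts are all hypothetical and are not taken from the paper, which works with a lexicalized grammar.

```python
from collections import Counter

def map_count_merging(out_counts, in_counts, alpha=1.0):
    """Illustrative count-merging estimate of P(rhs | lhs).

    out_counts and in_counts map (lhs, rhs) rule tuples to counts.
    alpha (an assumed parameter) scales the out-of-domain counts,
    which play the role of prior pseudo-counts.
    """
    merged = Counter()
    for rule, count in out_counts.items():
        merged[rule] += alpha * count
    for rule, count in in_counts.items():
        merged[rule] += count

    # Total merged mass per left-hand side, used for normalization.
    lhs_totals = Counter()
    for (lhs, _rhs), count in merged.items():
        lhs_totals[lhs] += count

    return {
        rule: count / lhs_totals[rule[0]]
        for rule, count in merged.items()
        if lhs_totals[rule[0]] > 0
    }

# Toy, made-up counts for two NP expansions.
out_domain = {("NP", ("DT", "NN")): 6, ("NP", ("NNP",)): 2}
in_domain = {("NP", ("DT", "NN")): 1, ("NP", ("NNP",)): 3}

probs_equal = map_count_merging(out_domain, in_domain, alpha=1.0)
probs_half = map_count_merging(out_domain, in_domain, alpha=0.5)
```

With `alpha=1.0` this reduces to plain concatenation of the two treebanks; smaller values of `alpha` let the in-domain counts dominate sooner, which matters most when in-domain data is scarce.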

Similar resources

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a large amount of labeled data with similar attributes in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. It is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...

Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports

Unsupervised word classes induced from unannotated text corpora are increasingly used to help tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection task with two specific objectives: How do unsupervised word classes compare to available knowledge-based semantic classes...

Cache-based Dynamic PCFG Adaptation using MAP Estimation

This paper presents a cache-based dynamic adaptation technique for a lexicalized probabilistic context-free grammar (LPCFG). Expected counts from machine-parsed sentences of in-domain data are stored in a cache and combined with prior counts from hand-annotated parses of out-of-domain data using maximum a posteriori (MAP) estimation. This adaptation is unsupervised, and dynamic with an adap...

MAP adaptation of stochastic grammars

This paper investigates supervised and unsupervised adaptation of stochastic grammars, including n-gram language models and probabilistic context-free grammars (PCFGs), to a new domain. It is shown that the commonly used approaches of count merging and model interpolation are special cases of a more general maximum a posteriori (MAP) framework, which additionally allows for alternate adaptation ...

Adaptive Pattern Recognition to Ensure Clinical Viability over Time

Pattern recognition is a useful tool for deciphering movement intent from myoelectric signals. To be clinically viable over time, recognition paradigms must be capable of adapting with the user. Most existing paradigms are static, although two forms of adaptation have received limited attention: supervised adaptation achieves high accuracy, since the intended class is known, but at the...

Publication date: 2003